Abstract

Careful tuning of a regularization parameter is indispensable in many machine learning tasks because it has a significant impact on generalization performance. Nevertheless, the current practice of regularization parameter tuning is more of an art than a science. For example, it is hard to tell how many grid points would be needed in cross-validation (CV) to obtain a solution with sufficiently small CV error. In this paper we propose a novel framework for computing a lower bound of the CV errors as a function of the regularization parameter, which we call the regularization path of CV error lower bounds. The proposed framework can be used to provide a theoretical approximation guarantee on a set of solutions, in the sense of how far the CV error of the current best solution could be from the best possible CV error in the entire range of the regularization parameters. We demonstrate through numerical experiments that a theoretically guaranteed choice of a regularization parameter in the above sense is possible with reasonable computational costs.


1 Introduction 

Many machine learning tasks involve careful tuning of a regularization parameter that controls the balance between an empirical loss term and a regularization term. A regularization parameter is usually selected by comparing the cross-validation (CV) errors at several different regularization parameters. Although its choice has a significant impact on generalization performance, the current practice is still more of an art than a science. For example, in the commonly used grid search, it is hard to tell how many grid points we should search over to obtain a sufficiently small CV error.

In this paper we introduce a novel framework for a class of regularized binary classification problems that can compute a regularization path of CV error lower bounds. For an $\varepsilon\in[0,1]$, we define $\varepsilon$-approximate regularization parameters to be the set of regularization parameters at which the CV error of the solution is guaranteed to exceed the best possible CV error in the entire range of regularization parameters by at most $\varepsilon$. Given a set of solutions obtained, for example, by grid-search, the proposed framework allows us to provide a theoretical guarantee on the current best solution by explicitly quantifying its approximation level $\varepsilon$ in the above sense. Furthermore, when a desired approximation level $\varepsilon$ is specified, the proposed framework can be used to efficiently find one of the $\varepsilon$-approximate regularization parameters.

The proposed framework is built on a novel CV error lower bound that can be represented as a function of the regularization parameter, which is why we call it a regularization path of CV error lower bounds. Computing a path requires no special optimization algorithm: we only need a finite number of solutions obtained by any algorithm. It is thus easy to apply our framework to common regularization parameter tuning strategies such as grid-search or Bayesian optimization. Furthermore, the proposed framework can be used not only with exact optimal solutions but also with sufficiently good approximate solutions, which is computationally advantageous because completely solving an optimization problem is often much more costly than obtaining a reasonably good approximate solution.

Our main contribution in this paper is to show that a theoretically guaranteed choice of a regularization parameter in the above sense is possible with reasonable computational costs. To the best of our knowledge, there are no other existing methods that provide such a theoretical guarantee on CV error and can be used as generally as ours. Figure 1 illustrates the behavior of the algorithm for obtaining an $\varepsilon = 0.1$ approximate regularization parameter (see §5 for the setup).

Related works The optimal regularization parameter can be found if the exact regularization path can be computed. Exact regularization paths have been intensively studied [8, 15], but they are known to be numerically unstable and do not scale well. Furthermore, an exact regularization path can be computed only for a limited class of problems whose solutions are written as piecewise-linear functions of the regularization parameter [22]. Our framework is much more efficient and can be applied to wider classes of problems whose exact regularization path cannot be computed. This work was motivated by recent studies on approximate regularization paths [13, 11, 12, 20]. These approximate regularization paths have the property that, over the entire range of regularization parameters, the objective function value at each regularization parameter value exceeds the optimal objective function value by at most $\varepsilon$. Although these algorithms are much more stable and efficient than exact ones, for the task of tuning a regularization parameter our interest is not in objective function values but in CV errors. Our approach is more suitable for regularization parameter tuning tasks in the sense that the approximation quality is guaranteed in terms of CV error.

Figure 1: An illustration of the proposed framework. One of our algorithms presented in §4 automatically selected 39 regularization parameter values in $[10^{-3}, 10^{3}]$, and an upper bound of the validation error for each of them is obtained by solving an optimization problem approximately. Among those 39 values, the one with the smallest validation error upper bound (indicated as ★ at $C = 1.368$) is guaranteed to be an $\varepsilon (= 0.1)$-approximate regularization parameter in the sense that its validation error exceeds the smallest possible validation error in the whole interval $[10^{-3}, 10^{3}]$ by at most $\varepsilon$. See §5 for the setup (see also Figure 3 for the results with other options).

As illustrated in Figure 1, we only compute a finite number of solutions, but still provide an approximation guarantee in the whole interval of the regularization parameter. To ensure such a property, we need to introduce a novel CV error lower bound that is sufficiently tight and represented as a monotonic function of the regularization parameter. Although several CV error bounds (mostly for leave-one-out CV) of SVM and other similar learning frameworks exist (e.g., [26, 16, 7, 17]), none of them satisfies the above required properties. The idea of our CV error bound is inspired by recent studies on safe screening [9, 28, 21, 19, 27] (see Appendix A for details). Furthermore, we emphasize that our contribution is not in presenting a new generalization error bound, but in introducing a practical framework for providing a theoretical guarantee on the choice of a regularization parameter. Although generalization error bounds such as structural risk minimization [25] might be used for rough tuning of a regularization parameter, they are known to be too loose to serve as an alternative to CV (see, e.g., §11 in [23]). We also note that our contribution is not in presenting a new method for regularization parameter tuning, such as Bayesian optimization [24], random search [1] and gradient-based search [6]. As we demonstrate in experiments, our approach can provide a theoretical approximation guarantee for the regularization parameter selected by these existing methods.

2 Problem Setup 

We consider linear binary classification problems. Let $\{(x_i, y_i)\in\mathbb{R}^d\times\{-1,1\}\}_{i\in[n]}$ be the training set, where $n$ is the size of the training set, $d$ is the input dimension, and $[n] := \{1,\dots,n\}$. An independent held-out validation set of size $n'$ is denoted similarly as $\{(x'_i, y'_i)\in\mathbb{R}^d\times\{-1,1\}\}_{i\in[n']}$. A linear decision function is written as $f(x) = w^\top x$, where $w\in\mathbb{R}^d$ is a vector of coefficients and $\top$ represents the transpose. We assume the availability of a held-out validation set only to simplify the exposition. All the methods presented in this paper can be straightforwardly adapted to a cross-validation setup. Furthermore, the proposed methods can be kernelized if the loss function satisfies a certain condition. In this paper we focus on the following class of regularized convex loss minimization problems:


$$w^*_C := \arg\min_{w\in\mathbb{R}^d} \ \frac{1}{2}\|w\|^2 + C\sum_{i\in[n]}\ell(y_i, w^\top x_i), \tag{1}$$


where $C > 0$ is the regularization parameter and $\|\cdot\|$ is the Euclidean norm. The loss function is denoted as $\ell: \{-1,1\}\times\mathbb{R}\to\mathbb{R}$. We assume that $\ell(\cdot,\cdot)$ is convex and subdifferentiable in the 2nd argument. Examples of such loss functions include the logistic loss, hinge loss, Huber-hinge loss, etc. For notational convenience, we denote the individual loss as $\ell_i(w) := \ell(y_i, w^\top x_i)$ for all $i\in[n]$. The optimal solution for the regularization parameter $C$ is explicitly denoted as $w^*_C$. We assume that the regularization parameter is defined in a finite interval $[C_\ell, C_u]$, e.g., $C_\ell = 10^{-3}$ and $C_u = 10^{3}$ as we did in the experiments.

For a solution $w$, the validation error¹ is defined as

$$E_v(w) := \frac{1}{n'}\sum_{i\in[n']} I(y'_i w^\top x'_i < 0), \tag{2}$$

where $I(\cdot)$ is the indicator function. In this paper, we consider two problems. In the first problem, given a set of (either optimal or approximate) solutions $\hat w_{C_1},\dots,\hat w_{C_T}$ at $T$ different regularization parameters $C_1,\dots,C_T\in[C_\ell, C_u]$, we compute the approximation level $\varepsilon$ such that

$$\min_{C_t\in\{C_1,\dots,C_T\}} E_v(w^*_{C_t}) - E_v^* \le \varepsilon, \quad\text{where}\quad E_v^* := \min_{C\in[C_\ell, C_u]} E_v(w^*_C). \tag{3}$$

In the second problem, we find an $\varepsilon$-approximate regularization parameter within the interval $[C_\ell, C_u]$, which is defined as an element of the following set:

$$\mathcal{C}(\varepsilon) := \big\{C\in[C_\ell, C_u] \;\big|\; E_v(w^*_C) - E_v^* \le \varepsilon\big\}.$$

Both of these problems can be solved by using our proposed framework for computing a path of validation error lower bounds.
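As a concrete reference point for (2), here is a minimal NumPy sketch. The array names `Xv` (validation inputs stacked row-wise) and `yv` (labels in $\{-1,+1\}$) are our own illustrative assumptions; the zero-score convention follows footnote 1.

```python
import numpy as np

def validation_error(w, Xv, yv):
    """E_v(w) from eq. (2): fraction of validation points with y'_i w^T x'_i < 0.

    A score of exactly zero counts as correctly classified (footnote 1).
    """
    return float(np.mean(yv * (Xv @ w) < 0))

# Toy validation set (illustrative values only).
Xv = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0], [1.0, -1.0]])
yv = np.array([1, -1, 1, -1])
w = np.array([1.0, 1.0])
print(validation_error(w, Xv, yv))  # -> 0.25 (only the second point has y'_i * score < 0)
```

The last two points have score exactly zero and are therefore not counted as errors.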


3 Validation error lower bounds as a function of regularization parameter

In this section, we derive a validation error lower bound which is represented as a function of the regularization parameter $C$. Our basic idea is to compute a lower and an upper bound of the inner product score $w^{*\top}_C x'_i$ for each validation input $x'_i$, $i\in[n']$, as a function of the regularization parameter $C$. For computing the bounds of $w^{*\top}_C x'_i$, we use a solution (either optimal or approximate) for a different regularization parameter $\tilde C\neq C$.

¹ For simplicity, we regard a validation instance whose score is exactly zero, i.e., $w^\top x'_i = 0$, as correctly classified in (2). Hereafter, we assume that there are no validation instances whose input vector is completely 0, i.e., $x'_i = 0$, because those instances are always correctly classified according to the definition in (2).
3.1 Score bounds


We first describe how to obtain a lower and an upper bound of the inner product score $w^{*\top}_C x'_i$ based on an approximate solution at a different regularization parameter $\tilde C\neq C$.


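Lemma 1. Let $\hat w_{\tilde C}$ be an approximate solution of the problem (1) for a regularization parameter value $\tilde C$, and let $\xi_i(\hat w_{\tilde C})$ be a subgradient of $\ell_i$ at $w = \hat w_{\tilde C}$ such that a subgradient of the objective function is

$$g(\hat w_{\tilde C}) := \hat w_{\tilde C} + \tilde C\sum_{i\in[n]}\xi_i(\hat w_{\tilde C}). \tag{4}$$

Then, for any $C > 0$, the score $w^{*\top}_C x'_i$, $i\in[n']$, satisfies

$$w^{*\top}_C x'_i \ge LB(w^{*\top}_C x'_i\mid\hat w_{\tilde C}) := \begin{cases}\alpha(\hat w_{\tilde C}, x'_i) - \frac{1}{\tilde C}\big(\beta(\hat w_{\tilde C}, x'_i) + \gamma(g(\hat w_{\tilde C}), x'_i)\big)C, & \text{if } C > \tilde C,\\ -\beta(\hat w_{\tilde C}, x'_i) + \frac{1}{\tilde C}\big(\alpha(\hat w_{\tilde C}, x'_i) - \gamma(g(\hat w_{\tilde C}), x'_i)\big)C, & \text{if } C < \tilde C,\end{cases} \tag{5a}$$

$$w^{*\top}_C x'_i \le UB(w^{*\top}_C x'_i\mid\hat w_{\tilde C}) := \begin{cases}-\beta(\hat w_{\tilde C}, x'_i) + \frac{1}{\tilde C}\big(\alpha(\hat w_{\tilde C}, x'_i) + \delta(g(\hat w_{\tilde C}), x'_i)\big)C, & \text{if } C > \tilde C,\\ \alpha(\hat w_{\tilde C}, x'_i) - \frac{1}{\tilde C}\big(\beta(\hat w_{\tilde C}, x'_i) - \delta(g(\hat w_{\tilde C}), x'_i)\big)C, & \text{if } C < \tilde C,\end{cases} \tag{5b}$$

where

$$\alpha(\hat w_{\tilde C}, x'_i) := \tfrac{1}{2}\big(\|\hat w_{\tilde C}\|\|x'_i\| + \hat w_{\tilde C}^\top x'_i\big) \ge 0, \qquad \gamma(g(\hat w_{\tilde C}), x'_i) := \tfrac{1}{2}\big(\|g(\hat w_{\tilde C})\|\|x'_i\| + g(\hat w_{\tilde C})^\top x'_i\big) \ge 0,$$
$$\beta(\hat w_{\tilde C}, x'_i) := \tfrac{1}{2}\big(\|\hat w_{\tilde C}\|\|x'_i\| - \hat w_{\tilde C}^\top x'_i\big) \ge 0, \qquad \delta(g(\hat w_{\tilde C}), x'_i) := \tfrac{1}{2}\big(\|g(\hat w_{\tilde C})\|\|x'_i\| - g(\hat w_{\tilde C})^\top x'_i\big) \ge 0.$$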


The proof is presented in Appendix A. Lemma 1 states that, for each validation instance, we have a lower and an upper bound of the score $w^{*\top}_C x'_i$ that change linearly with the regularization parameter $C$ on each side of $\tilde C$. When $\hat w_{\tilde C}$ is optimal, it can be shown (see Proposition B.24 in [2]) that there exists a subgradient such that $g(\hat w_{\tilde C}) = 0$, meaning that the bounds become tight at $C = \tilde C$ because $\gamma(g(\hat w_{\tilde C}), x'_i) = \delta(g(\hat w_{\tilde C}), x'_i) = 0$.

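Corollary 2. When $C = \tilde C$, the score $w^{*\top}_{\tilde C} x'_i$, $i\in[n']$, for the regularization parameter value $\tilde C$ itself satisfies

$$w^{*\top}_{\tilde C} x'_i \ge LB(w^{*\top}_{\tilde C} x'_i\mid\hat w_{\tilde C}) = \hat w_{\tilde C}^\top x'_i - \gamma(g(\hat w_{\tilde C}), x'_i), \qquad w^{*\top}_{\tilde C} x'_i \le UB(w^{*\top}_{\tilde C} x'_i\mid\hat w_{\tilde C}) = \hat w_{\tilde C}^\top x'_i + \delta(g(\hat w_{\tilde C}), x'_i).$$

The results in Corollary 2 are obtained by simply substituting $C = \tilde C$ into (5a) and (5b); both branches of each bound give the same value there.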
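To make (5a) and (5b) concrete, the following minimal NumPy sketch evaluates both bounds for a single validation input. The function and variable names are ours. The squared loss $\ell(y, s) = (1 - ys)^2$ used in the check is a hypothetical stand-in, chosen only because its exact solution $w^*_C$ has a closed form; it is not the Huber hinge loss used in §5.

```python
import numpy as np

def score_bounds(w_hat, g, C_tilde, C, x):
    """Bounds (5a)-(5b) on the score w*_C^T x, computed from an approximate
    solution w_hat at C_tilde and the objective subgradient g = g(w_hat)."""
    nw, ng, nx = np.linalg.norm(w_hat), np.linalg.norm(g), np.linalg.norm(x)
    alpha = 0.5 * (nw * nx + w_hat @ x)
    beta = 0.5 * (nw * nx - w_hat @ x)
    gamma = 0.5 * (ng * nx + g @ x)
    delta = 0.5 * (ng * nx - g @ x)
    r = C / C_tilde
    if C >= C_tilde:  # at C == C_tilde this reduces to Corollary 2
        return alpha - r * (beta + gamma), -beta + r * (alpha + delta)
    return -beta + r * (alpha - gamma), alpha - r * (beta - delta)

# Hypothetical squared loss l(y, s) = (1 - y s)^2, for which
# w*_C = (I + 2C X^T X)^{-1} 2C X^T y and g(w) = w - 2C X^T (y - X w).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(30, 5)), rng.choice([-1.0, 1.0], size=30)
exact = lambda C: np.linalg.solve(np.eye(5) + 2 * C * X.T @ X, 2 * C * X.T @ y)
C_tilde = 1.0
w_hat = exact(C_tilde) + 1e-2 * rng.normal(size=5)  # deliberately inexact
g = w_hat - 2 * C_tilde * X.T @ (y - X @ w_hat)
x_new = rng.normal(size=5)
for C in [0.1, 0.5, 2.0, 10.0]:
    lb, ub = score_bounds(w_hat, g, C_tilde, C, x_new)
    print(C, lb <= exact(C) @ x_new <= ub)
```

Each comparison is expected to be True, including for $C$ far from $\tilde C$, where the bounds become loose but remain valid.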


3.2 Validation Error Bounds 

Given a lower and an upper bound of the score of each validation instance, a lower bound of the validation error can be computed by simply using the following facts:

$$y'_i = +1 \ \text{and}\ UB(w^{*\top}_C x'_i\mid\hat w_{\tilde C}) < 0 \;\Rightarrow\; \text{mis-classified}, \tag{6a}$$
$$y'_i = -1 \ \text{and}\ LB(w^{*\top}_C x'_i\mid\hat w_{\tilde C}) > 0 \;\Rightarrow\; \text{mis-classified}. \tag{6b}$$

Furthermore, since the bounds in Lemma 1 change linearly with the regularization parameter $C$ on each side of $\tilde C$, we can identify the interval of $C$ within which the validation instance is guaranteed to be mis-classified.
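Lemma 3. Write $\alpha_i := \alpha(\hat w_{\tilde C}, x'_i)$, $\beta_i := \beta(\hat w_{\tilde C}, x'_i)$, $\gamma_i := \gamma(g(\hat w_{\tilde C}), x'_i)$ and $\delta_i := \delta(g(\hat w_{\tilde C}), x'_i)$. For a validation instance with $y'_i = +1$, if

$$\tilde C < C < \frac{\beta_i}{\alpha_i + \delta_i}\tilde C \qquad\text{or}\qquad \frac{\alpha_i}{\beta_i - \delta_i}\tilde C < C < \tilde C \quad(\text{the latter only if } \beta_i > \delta_i),$$

then the validation instance $(x'_i, y'_i)$ is mis-classified. Similarly, for a validation instance with $y'_i = -1$, if

$$\tilde C < C < \frac{\alpha_i}{\beta_i + \gamma_i}\tilde C \qquad\text{or}\qquad \frac{\beta_i}{\alpha_i - \gamma_i}\tilde C < C < \tilde C \quad(\text{the latter only if } \alpha_i > \gamma_i),$$

then the validation instance $(x'_i, y'_i)$ is mis-classified.

This lemma can be easily shown by applying (5) to (6).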
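Lemma 3 amounts to locating, on each side of $\tilde C$, where a linear function of $C$ changes sign. Below is a minimal sketch under that reading; the helper names are hypothetical. It returns the open intervals on which an instance is certified to be mis-classified, and it relies on the same $\alpha, \beta, \gamma, \delta$ as Lemma 1.

```python
import numpy as np

def _where_negative(p, q, lo, hi):
    """Open sub-interval of (lo, hi) on which p + q*C < 0, or None if empty."""
    if q > 0:
        lo_, hi_ = lo, min(hi, -p / q)
    elif q < 0:
        lo_, hi_ = max(lo, -p / q), hi
    else:
        lo_, hi_ = (lo, hi) if p < 0 else (lo, lo)
    return (float(lo_), float(hi_)) if lo_ < hi_ else None

def misclassified_intervals(w_hat, g, C_tilde, x, y):
    """Intervals of C on which Lemma 3 certifies that (x, y) is mis-classified."""
    nw, ng, nx = np.linalg.norm(w_hat), np.linalg.norm(g), np.linalg.norm(x)
    a, b = 0.5 * (nw * nx + w_hat @ x), 0.5 * (nw * nx - w_hat @ x)  # alpha, beta
    gm, d = 0.5 * (ng * nx + g @ x), 0.5 * (ng * nx - g @ x)         # gamma, delta
    k = 1.0 / C_tilde
    if y == 1:  # need UB < 0; UB = p + q*C on each side of C_tilde, cf. (5b)
        pieces = [(-b, k * (a + d), C_tilde, np.inf), (a, -k * (b - d), 0.0, C_tilde)]
    else:       # need LB > 0, i.e. -LB = p + q*C < 0, cf. (5a)
        pieces = [(-a, k * (b + gm), C_tilde, np.inf), (b, -k * (a - gm), 0.0, C_tilde)]
    return [iv for iv in (_where_negative(*pc) for pc in pieces) if iv is not None]

# A score of -1 at an exact solution (g = 0): certified wrong on both sides of C_tilde.
print(misclassified_intervals(np.array([-1.0, 0.0]), np.zeros(2), 1.0,
                              np.array([1.0, 0.0]), 1))  # -> [(1.0, inf), (0.0, 1.0)]
```

Writing each condition as "a linear function of $C$ is negative" avoids dividing by a possibly zero denominator such as $\beta_i - \delta_i$.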

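Using Lemma 3, the lower bound of the validation error is represented as a function of the regularization parameter $C$ in the following form.

Theorem 4. Using an approximate solution $\hat w_{\tilde C}$ for a regularization parameter $\tilde C$, the validation error $E_v(w^*_C)$ for any $C > 0$ satisfies

$$E_v(w^*_C) \ge LB(E_v(w^*_C)\mid\hat w_{\tilde C}) := \frac{1}{n'}\Big(\sum_{i:\, y'_i = +1} I\big(UB(w^{*\top}_C x'_i\mid\hat w_{\tilde C}) < 0\big) + \sum_{i:\, y'_i = -1} I\big(LB(w^{*\top}_C x'_i\mid\hat w_{\tilde C}) > 0\big)\Big), \tag{7}$$

where the score bounds are those of Lemma 1 (Corollary 2 at $C = \tilde C$). For $C\neq\tilde C$, the indicator for the $i$-th instance equals one exactly on the intervals given in Lemma 3.

Theorem 4 is a direct consequence of Lemma 3. The lower bound (7) is a staircase function of the regularization parameter $C$.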
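Theorem 4 can be evaluated at any $C$ by counting the instances whose score bounds certify an error. Below is a minimal NumPy sketch with names of our own choosing. It again uses a hypothetical squared loss with closed-form $w^*_C$, purely to generate a solution and to compare the bound with the true validation error.

```python
import numpy as np

def error_lower_bound(w_hat, g, C_tilde, Xv, yv, C):
    """LB(E_v(w*_C) | w_hat) of Theorem 4, eq. (7), evaluated at one C > 0."""
    nx = np.linalg.norm(Xv, axis=1)
    s, t = Xv @ w_hat, Xv @ g
    nw, ng = np.linalg.norm(w_hat), np.linalg.norm(g)
    a, b = 0.5 * (nw * nx + s), 0.5 * (nw * nx - s)    # alpha, beta
    gm, d = 0.5 * (ng * nx + t), 0.5 * (ng * nx - t)   # gamma, delta
    r = C / C_tilde
    if C >= C_tilde:
        lb, ub = a - r * (b + gm), -b + r * (a + d)
    else:
        lb, ub = -b + r * (a - gm), a - r * (b - d)
    # Count instances whose score bounds certify a mis-classification, cf. (6a)-(6b).
    n_sure = np.sum((yv == 1) & (ub < 0)) + np.sum((yv == -1) & (lb > 0))
    return float(n_sure) / len(yv)

# Usage with a hypothetical squared loss l(y, s) = (1 - y s)^2 (closed-form w*_C).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 3)), rng.choice([-1.0, 1.0], size=50)
Xv, yv = rng.normal(size=(40, 3)), rng.choice([-1, 1], size=40)
exact = lambda C: np.linalg.solve(np.eye(3) + 2 * C * X.T @ X, 2 * C * X.T @ y)
w_hat = exact(1.0)
g = w_hat - 2.0 * X.T @ (y - X @ w_hat)
for C in np.logspace(-2, 2, 5):
    true_err = np.mean(yv * (Xv @ exact(C)) < 0)
    print(f"C={C:8.3f}  LB={error_lower_bound(w_hat, g, 1.0, Xv, yv, C):.3f}  E_v={true_err:.3f}")
```

Plotting the bound over a dense log-spaced grid of $C$ would show the staircase shape described above.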

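By setting $C = \tilde C$, we can obtain a lower and an upper bound of the validation error for the regularization parameter $\tilde C$ itself, which are used in the algorithm as a stopping criterion for obtaining an approximate solution $\hat w_{\tilde C}$.

Corollary 5. Given an approximate solution $\hat w_{\tilde C}$, the validation error $E_v(w^*_{\tilde C})$ satisfies

$$E_v(w^*_{\tilde C}) \ge LB(E_v(w^*_{\tilde C})\mid\hat w_{\tilde C}) = \frac{1}{n'}\Big(\sum_{i:\, y'_i = +1} I\big(\hat w_{\tilde C}^\top x'_i + \delta(g(\hat w_{\tilde C}), x'_i) < 0\big) + \sum_{i:\, y'_i = -1} I\big(\hat w_{\tilde C}^\top x'_i - \gamma(g(\hat w_{\tilde C}), x'_i) > 0\big)\Big), \tag{8a}$$

$$E_v(w^*_{\tilde C}) \le UB(E_v(w^*_{\tilde C})\mid\hat w_{\tilde C}) = \frac{1}{n'}\Big(\sum_{i:\, y'_i = +1} I\big(\hat w_{\tilde C}^\top x'_i - \gamma(g(\hat w_{\tilde C}), x'_i) < 0\big) + \sum_{i:\, y'_i = -1} I\big(\hat w_{\tilde C}^\top x'_i + \delta(g(\hat w_{\tilde C}), x'_i) > 0\big)\Big). \tag{8b}$$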
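Corollary 5 is what an inner solver would check while optimizing at $\tilde C$. For instance, §4.2 suggests continuing until the gap between (8b) and (8a) is sufficiently smaller than $\varepsilon$ (typically $0.1\varepsilon$). Below is a minimal sketch on toy values; all names and numbers are illustrative.

```python
import numpy as np

def error_bounds_at(w_hat, g, Xv, yv):
    """(LB, UB) on E_v(w*_{C~}) from Corollary 5, eqs. (8a) and (8b)."""
    nx = np.linalg.norm(Xv, axis=1)
    s, t, ng = Xv @ w_hat, Xv @ g, np.linalg.norm(g)
    gamma, delta = 0.5 * (ng * nx + t), 0.5 * (ng * nx - t)
    pos, neg = (yv == 1), (yv == -1)
    # (8a): instances that are mis-classified for sure
    lb = np.sum(pos & (s + delta < 0)) + np.sum(neg & (s - gamma > 0))
    # (8b): instances that might be mis-classified
    ub = np.sum(pos & (s - gamma < 0)) + np.sum(neg & (s + delta > 0))
    return float(lb) / len(yv), float(ub) / len(yv)

# Toy values: the second point's exact score lies in [-0.05, 0.05], so its
# outcome is undecided and it widens the gap between the two bounds.
Xv = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
yv = np.array([1, -1, -1])
w_hat, g = np.array([1.0, 0.05]), np.array([0.0, 0.1])
print(error_bounds_at(w_hat, g, Xv, yv))  # -> (0.0, 0.3333333333333333)
```

With $g = 0$ (an exact solution) the two bounds coincide, which is consistent with Corollary 2.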


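4 Algorithm

In this section we present two algorithms, one for each of the two problems discussed in §2. Due to space limitations, we only roughly describe the most fundamental forms of these algorithms. Details and several extensions of the algorithms are presented in supplementary appendices B, C and D.

Algorithm 1: Computing the approximation level $\varepsilon$ from the given set of solutions
Input: $C_\ell$, $C_u$, $\mathcal{W} := \{\hat w_{C_1},\dots,\hat w_{C_T}\}$
1: $E_v^{\rm best} \leftarrow \min_{C_t\in\{C_1,\dots,C_T\}} UB(E_v(w^*_{C_t})\mid\hat w_{C_t})$
2: $LB(E_v^*) \leftarrow \min_{C\in[C_\ell, C_u]}\big\{\max_{C_t\in\{C_1,\dots,C_T\}} LB(E_v(w^*_C)\mid\hat w_{C_t})\big\}$
Output: $\varepsilon = E_v^{\rm best} - LB(E_v^*)$

Algorithm 2: Finding an $\varepsilon$-approximate regularization parameter with approximate solutions
Input: $\{(x_i, y_i)\}_{i\in[n]}$, $\{(x'_i, y'_i)\}_{i\in[n']}$, $C_\ell$, $C_u$, $\varepsilon$
1: $t \leftarrow 1$, $C_t \leftarrow C_\ell$, $C^{\rm best} \leftarrow C_\ell$, $E_v^{\rm best} \leftarrow 1$
2: while $C_t < C_u$ do
3:   $\hat w_{C_t} \leftarrow$ solve (1) approximately for $C = C_t$
4:   Compute $UB(E_v(w^*_{C_t})\mid\hat w_{C_t})$ by (8b).
5:   if $UB(E_v(w^*_{C_t})\mid\hat w_{C_t}) < E_v^{\rm best}$ then
6:     $E_v^{\rm best} \leftarrow UB(E_v(w^*_{C_t})\mid\hat w_{C_t})$, $C^{\rm best} \leftarrow C_t$
7:   end if
8:   Set $C_{t+1}$ by (10)
9:   $t \leftarrow t + 1$
10: end while
Output: $C^{\rm best}\in\mathcal{C}(\varepsilon)$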

4.1 Problem 1: Computing the approximation level $\varepsilon$ from a given set of solutions

Given a set of (either optimal or approximate) solutions $\hat w_{C_1},\dots,\hat w_{C_T}$, obtained, e.g., by ordinary grid-search, our first problem is to provide a theoretical approximation level $\varepsilon$ in the sense of (3)². This problem can be solved easily by using the validation error lower bounds developed in §3.2. The procedure is presented in Algorithm 1. Line 1 computes an upper bound of the current best validation error, and line 2 computes a lower bound of the best possible validation error $E_v^* := \min_{C\in[C_\ell, C_u]} E_v(w^*_C)$. The approximation level $\varepsilon$ is then simply obtained by subtracting the latter from the former. We note that $LB(E_v^*)$, the lower bound of $E_v^*$, can be easily computed by using the $T$ validation error lower bounds $LB(E_v(w^*_C)\mid\hat w_{C_t})$, $t = 1,\dots,T$, because they are represented as staircase functions of $C$.

² When we only have approximate solutions $\hat w_{C_1},\dots,\hat w_{C_T}$, Eq. (3) is slightly incorrect: the first term on the l.h.s. of (3) should be $\min_{C_t\in\{C_1,\dots,C_T\}} UB(E_v(w^*_{C_t})\mid\hat w_{C_t})$.
4.2 Problem 2: Finding an $\varepsilon$-approximate regularization parameter

Given a desired approximation level $\varepsilon$ such as $\varepsilon = 0.01$, our second problem is to find an $\varepsilon$-approximate regularization parameter. To this end we develop an algorithm that produces a set of optimal or approximate solutions $\hat w_{C_1},\dots,\hat w_{C_T}$ such that, if we apply Algorithm 1 to this sequence, the resulting approximation level is smaller than or equal to $\varepsilon$. Algorithm 2 is the pseudo-code of this algorithm. It computes approximate solutions for an increasing sequence of regularization parameters in the main loop (lines 2 to 10).

Let us now consider the $t$-th iteration of the main loop, where we have already computed $t-1$ approximate solutions $\hat w_{C_1},\dots,\hat w_{C_{t-1}}$ for $C_1 < \dots < C_{t-1}$. At this point,

$$C^{\rm best} := \arg\min_{C_{t'}\in\{C_1,\dots,C_{t-1}\}} UB(E_v(w^*_{C_{t'}})\mid\hat w_{C_{t'}})$$

is the best (in the worst-case sense) regularization parameter obtained so far, and it is guaranteed to be an $\varepsilon$-approximate regularization parameter in the interval $[C_\ell, C_t]$ in the sense that its validation error upper bound

$$E_v^{\rm best} := \min_{C_{t'}\in\{C_1,\dots,C_{t-1}\}} UB(E_v(w^*_{C_{t'}})\mid\hat w_{C_{t'}})$$

exceeds the smallest possible validation error in the interval $[C_\ell, C_t]$ by at most $\varepsilon$. However, we do not know whether $C^{\rm best}$ keeps the $\varepsilon$-approximation property for $C > C_t$. Thus, in line 3, we approximately solve the optimization problem (1) at $C = C_t$ and obtain an approximate solution $\hat w_{C_t}$. Note that the approximate solution $\hat w_{C_t}$ must be good enough in the sense that $UB(E_v(w^*_{C_t})\mid\hat w_{C_t}) - LB(E_v(w^*_{C_t})\mid\hat w_{C_t})$ is sufficiently smaller than $\varepsilon$ (typically $0.1\varepsilon$). If the upper bound of the validation error $UB(E_v(w^*_{C_t})\mid\hat w_{C_t})$ is smaller than $E_v^{\rm best}$, we update $E_v^{\rm best}$ and $C^{\rm best}$ (lines 5 to 7).

Our next task is to find $C_{t+1}$ in such a way that $C^{\rm best}$ is an $\varepsilon$-approximate regularization parameter in the interval $[C_\ell, C_{t+1}]$. Using the validation error lower bound in Theorem 4, the task is to find the smallest $C_{t+1} > C_t$ at which the condition

$$E_v^{\rm best} - LB(E_v(w^*_C)\mid\hat w_{C_t}) \le \varepsilon, \qquad C\in[C_t, C_u], \tag{9}$$

is violated.

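In order to formulate such a $C_{t+1}$, let us define

$$\mathcal{P} := \{i\in[n'] \mid y'_i = +1,\ UB(w^{*\top}_{C_t} x'_i\mid\hat w_{C_t}) < 0\}, \qquad \mathcal{N} := \{i\in[n'] \mid y'_i = -1,\ LB(w^{*\top}_{C_t} x'_i\mid\hat w_{C_t}) > 0\}.$$

Furthermore, let

$$\Gamma := \Big\{\frac{\beta(\hat w_{C_t}, x'_i)}{\alpha(\hat w_{C_t}, x'_i) + \delta(g(\hat w_{C_t}), x'_i)}C_t\Big\}_{i\in\mathcal{P}} \cup \Big\{\frac{\alpha(\hat w_{C_t}, x'_i)}{\beta(\hat w_{C_t}, x'_i) + \gamma(g(\hat w_{C_t}), x'_i)}C_t\Big\}_{i\in\mathcal{N}},$$

and denote the $k$-th smallest element of $\Gamma$ as $k^{\rm th}(\Gamma)$ for any natural number $k$. Then, the smallest $C_{t+1} > C_t$ that violates (9) is given as

$$C_{t+1} \leftarrow \Big(\big\lfloor n'\big(LB(E_v(w^*_{C_t})\mid\hat w_{C_t}) - E_v^{\rm best} + \varepsilon\big)\big\rfloor + 1\Big)^{\rm th}(\Gamma). \tag{10}$$

Here $LB(E_v(w^*_{C_t})\mid\hat w_{C_t}) = (|\mathcal{P}| + |\mathcal{N}|)/n'$ by (8a). For $C > C_t$, the lower bound (7) computed from $\hat w_{C_t}$ decreases by $1/n'$ each time $C$ passes an element of $\Gamma$, and (10) picks the first point at which this decrease makes (9) fail.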
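Eq. (10) only needs the sets $\mathcal{P}$ and $\mathcal{N}$ at $C_t$ and one order statistic of $\Gamma$. Below is a minimal sketch with hypothetical names. Returning `C_u` when $\Gamma$ has fewer than $k$ elements (in that case (9) holds on the whole remaining interval) and raising an error when $k < 1$ are our own choices, not part of the algorithm as stated.

```python
import math
import numpy as np

def next_C(w_hat, g, C_t, Xv, yv, E_best, eps, C_u):
    """Eq. (10): smallest C_{t+1} > C_t at which condition (9) fails."""
    nx = np.linalg.norm(Xv, axis=1)
    s, t = Xv @ w_hat, Xv @ g
    nw, ng = np.linalg.norm(w_hat), np.linalg.norm(g)
    a, b = 0.5 * (nw * nx + s), 0.5 * (nw * nx - s)    # alpha, beta
    gm, d = 0.5 * (ng * nx + t), 0.5 * (ng * nx - t)   # gamma, delta
    P = (yv == 1) & (s + d < 0)    # certified mis-classified at C_t (Corollary 2)
    N = (yv == -1) & (s - gm > 0)
    with np.errstate(divide="ignore"):
        Gamma = np.sort(np.concatenate([b[P] / (a[P] + d[P]), a[N] / (b[N] + gm[N])]) * C_t)
    lb_t = (P.sum() + N.sum()) / len(yv)  # LB(E_v(w*_{C_t}) | w_hat), eq. (8a)
    k = math.floor(len(yv) * (lb_t - E_best + eps)) + 1
    if k < 1:
        raise ValueError("(9) already fails at C_t; solve (1) more accurately there")
    return float(Gamma[k - 1]) if k <= len(Gamma) else C_u
```

In Algorithm 2 this would be called once per iteration, right after the $E_v^{\rm best}$ update.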




Figure 2: Illustrations of Algorithm 1 on three benchmark datasets: liver-disorders (D2), ionosphere (D3) and australian (D4). In each panel the horizontal axis is the number of elements in a set of solutions, T. The plots indicate how the approximation level $\varepsilon$ improves as the number of solutions T increases in grid-search (red), Bayesian optimization (blue) and our own method (green, see the main text).



Figure 3: Illustrations of Algorithm 2 on the ionosphere (D3) dataset for (a) op2 with $\varepsilon = 0.10$ (without tricks), (b) op2 with $\varepsilon = 0.05$ (without tricks) and (c) op3 with $\varepsilon = 0.05$ (with tricks 1 and 2), respectively. Figure 1 also shows the result for op3 with $\varepsilon = 0.10$.

5 Experiments 

In this section we present experiments illustrating the proposed methods. Table 2 summarizes the datasets used in the experiments. They are taken from the libsvm dataset repository [4]. All the input features except those of D9 and D10 were standardized to $[-1, 1]$³. For illustrative results, the instances were randomly divided into a training set and a validation set of roughly equal sizes. For quantitative results, we used 10-fold CV. We used the Huber hinge loss (e.g., [5]), which is convex and subdifferentiable with respect to the second argument. The proposed methods do not depend on the choice of optimization solver. In the experiments, we used the optimization solver described in [18], which is also implemented in the well-known liblinear software [10]. Our slightly modified code (adapted to the Huber hinge loss) is provided as supplementary material, and it will be put in the public domain after the paper is accepted. Whenever possible, we used a warm-start approach, i.e., when we trained a new solution, we used the closest solution trained so far (either approximate or optimal) as the initial starting point of the optimizer. All the computations were conducted using a single core of an HP workstation Z800 (Xeon(R) CPU X5675 (3.07GHz), 48GB MEM). In all the experiments, we set $C_\ell = 10^{-3}$ and $C_u = 10^{3}$.

³ We use D9 and D10 as they are, to exploit their sparsity.

Results on problem 1 We applied Algorithm 1 in §4 to a set of solutions obtained by 1) grid-search, 2) Bayesian optimization (with the expected improvement acquisition function), and 3) our own method that exploits information on the CV error lower bound available during the search process. Figure 2 illustrates the results on three datasets, showing how the approximation level $\varepsilon$ on the vertical axis changes as the number of solutions ($T$ in our notation) increases. The red plots indicate the results of grid-search: as we increased the number of grid points, the approximation level $\varepsilon$ tended to improve. The blue plots indicate the results of Bayesian optimization (BO). Since BO tends to focus its search on a small region of the regularization parameter, it was difficult to tightly bound the approximation level. The green plots indicate the results of the third option, where we sequentially computed a solution at the point where the validation error lower bound is smallest, based on the information obtained so far. The results suggest that this naive approach offers a slight improvement over grid-search.

Results on problem 2 We applied Algorithm 2 to benchmark datasets to demonstrate that a theoretically guaranteed choice of a regularization parameter is possible with reasonable computational costs. Besides the algorithm presented in §4, we also tested a variant described in supplementary Appendix B. Specifically, we have three algorithm options. In the first option (op1), we used optimal solutions $\{w^*_{C_t}\}_{t\in[T]}$ for computing CV error lower bounds. In the second option (op2), we instead used approximate solutions $\{\hat w_{C_t}\}_{t\in[T]}$. In the last option (op3), we additionally used the speed-up tricks described in supplementary Appendix B. We considered four different choices of $\varepsilon\in\{0.1, 0.05, 0.01, 0\}$. Note that $\varepsilon = 0$ corresponds to the task of finding the exactly optimal regularization parameter. In some datasets, the smallest validation errors are less than 0.1 or 0.05, in which case we do not report the results (indicated as "$E_v < 0.05$", etc.). In trick1, we initially computed solutions at four different regularization parameter values evenly allocated in $[10^{-3}, 10^{3}]$ on the logarithmic scale. In trick2, the next regularization parameter $C_{t+1}$ was set by replacing $\varepsilon$ in (10) with $1.5\varepsilon$ (see supplementary Appendix B).

For the purpose of illustration, we plot examples of validation error curves in several setups. Figure 3 shows the validation error curves of the ionosphere (D3) dataset for several options and values of $\varepsilon$.

Next, we report the results on computational costs in CV setups. Table 1 shows the number of optimization problems we actually solved in the algorithm (denoted as $T$) and the total computation time in seconds. The computational costs of the methods mostly depend on $T$. As is evident from the algorithm description in §4, $T$ gets smaller as $\varepsilon$ increases. The two tricks in supplementary Appendix B seem to be helpful for reducing $T$ in most cases. In addition, comparing the computation times of op1 and op2 shows the advantage of using approximate solutions, although approximate solutions can only be used for $\varepsilon\neq 0$. Overall, the results suggest that the proposed algorithm allows us to find theoretically guaranteed approximate regularization parameters with reasonable costs, except for the $\varepsilon = 0$ cases. For example, the algorithm found an $\varepsilon = 0.01$ approximate regularization parameter within a minute in 10-fold CV for a dataset with more than 50000 instances (see the results on D10 for $\varepsilon = 0.01$ with op2 and op3 in Table 1).

Table 1: Computational costs. For each of the three options and $\varepsilon\in\{0.10, 0.05, 0.01, 0\}$, the number of optimization problems solved (denoted as $T$) and the total computational costs in seconds (denoted as time) are listed. Note that, for op2, there are no results for $\varepsilon = 0$.

| dataset | $\varepsilon$ | op1 (using $w^*_C$): T | op1: time (sec) | op2 (using $\hat w_C$): T | op2: time (sec) | op3 (using tricks): T | op3: time (sec) |
|---|---|---|---|---|---|---|---|
| D1 | 0.10 | 30 | 0.068 | 32 | 0.031 | 33 | 0.041 |
| D1 | 0.05 | 68 | 0.124 | 70 | 0.061 | 57 | 0.057 |
| D1 | 0.01 | 234 | 0.428 | 324 | 0.194 | 205 | 0.157 |
| D1 | 0 | 442 | 0.697 | N.A. | N.A. | 383 | 0.629 |
| D2 | 0.10 | 221 | 0.177 | 223 | 0.124 | 131 | 0.084 |
| D2 | 0.05 | 534 | 0.385 | 540 | 0.290 | 367 | 0.218 |
| D2 | 0.01 | 1503 | 0.916 | 2183 | 0.825 | 1239 | 0.623 |
| D2 | 0 | 10939 | 6.387 | N.A. | N.A. | 6275 | 3.805 |
| D3 | 0.10 | 61 | 0.617 | 62 | 0.266 | 43 | 0.277 |
| D3 | 0.05 | 123 | 1.073 | 129 | 0.468 | 73 | 0.359 |
| D3 | 0.01 | 600 | 4.776 | 778 | 0.716 | 270 | 0.940 |
| D3 | 0 | 5412 | 26.39 | N.A. | N.A. | 815 | 6.344 |
| D4 | 0.10 | 27 | 0.169 | 27 | 0.088 | 23 | 0.093 |
| D4 | 0.05 | 64 | 0.342 | 65 | 0.173 | 47 | 0.153 |
| D4 | 0.01 | 167 | 0.786 | 181 | 0.418 | 156 | 0.399 |
| D4 | 0 | 342 | 1.317 | N.A. | N.A. | 345 | 1.205 |
| D5 | 0.10 | 62 | 0.236 | 63 | 0.108 | 45 | 0.091 |
| D5 | 0.05 | 108 | 0.417 | 109 | 0.171 | 77 | 0.137 |
| D5 | 0.01 | 421 | 1.201 | 440 | 0.631 | 258 | 0.401 |
| D5 | 0 | 2330 | 4.540 | N.A. | N.A. | 968 | 2.451 |
| D6 | 0.10 | 92 | 1.916 | 93 | 0.975 | 62 | 0.628 |
| D6 | 0.05 | 207 | 4.099 | 209 | 2.065 | 123 | 1.136 |
| D6 | 0.01 | 1042 | 16.31 | 1069 | 9.686 | 728 | 5.362 |
| D6 | 0 | 4276 | 57.57 | N.A. | N.A. | 2840 | 44.68 |
| D7 | 0.10 | 289 | 8.492 | 293 | 5.278 | 167 | 3.319 |
| D7 | 0.05 | 601 | 16.18 | 605 | 9.806 | 379 | 6.604 |
| D7 | 0.01 | 2532 | 57.79 | 2788 | 35.21 | 1735 | 24.04 |
| D7 | 0 | 67490 | 1135 | N.A. | N.A. | 42135 | 760.8 |
| D8 | 0.10 | 72 | 0.761 | 74 | 0.604 | 66 | 0.606 |
| D8 | 0.05 | 192 | 1.687 | 195 | 1.162 | 110 | 0.926 |
| D8 | 0.01 | 1063 | 8.257 | 1065 | 6.238 | 614 | 4.043 |
| D8 | 0 | 34920 | 218.4 | N.A. | N.A. | 15218 | 99.57 |
| D9 | 0.10 | 134 | 360.2 | 136 | 201.0 | 89 | 74.37 |
| D9 | 0.05 | 317 | 569.9 | 323 | 280.7 | 200 | 128.5 |
| D9 | 0.01 | 1791 | 2901 | 1822 | 1345 | 1164 | 657.4 |
| D9 | 0 | 85427 | 106937 | N.A. | N.A. | 63300 | 98631 |
| D10 | 0.10 | $E_v < 0.10$ | | $E_v < 0.10$ | | $E_v < 0.10$ | |
| D10 | 0.05 | $E_v < 0.05$ | | $E_v < 0.05$ | | $E_v < 0.05$ | |
| D10 | 0.01 | 157 | 81.75 | 162 | 31.02 | 114 | 36.81 |
| D10 | 0 | 258552 | 85610 | N.A. | N.A. | 42040 | 23316 |


Table 2: Benchmark datasets used in the experiments.

| | dataset name | sample size | input dimension |
|---|---|---|---|
| D1 | heart | 270 | 13 |
| D2 | liver-disorders | 345 | 6 |
| D3 | ionosphere | 351 | 34 |
| D4 | australian | 690 | 14 |
| D5 | diabetes | 768 | 8 |
| D6 | german.numer | 1000 | 24 |
| D7 | svmguide3 | 1284 | 21 |
| D8 | svmguide1 | 7089 | 4 |
| D9 | a1a | 32561 | 123 |
| D10 | w8a | 64700 | 300 |

6 Conclusions and future works 

We presented a novel algorithmic framework for computing CV error lower bounds as a function of the regularization parameter. The proposed framework can be used for a theoretically guaranteed choice of a regularization parameter. An additional advantage of this framework is that we only need to compute a set of sufficiently good approximate solutions to obtain such a theoretical guarantee, which is computationally advantageous. As demonstrated in the experiments, our algorithm is practical in the sense that the computational cost is reasonable as long as the approximation quality $\varepsilon$ is not too close to 0. An important direction for future work is to extend the approach to setups with multiple hyper-parameters.


A Proof of Lemma 1 


In this section we prove Lemma 1. First we present two propositions which are used in proving Lemma 1.

Proposition 6. Consider the following general problem:

$$\min_{z}\ \phi(z) \quad\text{s.t.}\quad z\in\mathcal{Z}, \tag{11}$$

where $\phi: \mathcal{Z}\to\mathbb{R}$ is a subdifferentiable convex function and $\mathcal{Z}\subseteq\mathbb{R}^d$ is a convex set. Then a solution $z^*$ is the optimal solution of (11) if and only if there exists a subgradient $\xi\in\partial\phi(z^*)$ such that

$$\xi^\top(z^* - z) \le 0, \quad \forall z\in\mathcal{Z},$$

where $\partial\phi(z^*)$ is the set of all subgradients of the convex function $\phi$ at $z = z^*$.

See, for example, Proposition B.24 in [2] for the proof of Proposition 6.

Proposition 7. Let $p, q$ be arbitrary $d$-dimensional vectors and $r > 0$ be an arbitrary positive constant. Then, the optimal values of the following optimization problems can be explicitly obtained as follows:

$$p^\top q - \|p\|r = \min_{z\in\mathbb{R}^d} p^\top z \quad\text{s.t.}\quad \|z - q\|^2 \le r^2, \tag{12}$$

$$p^\top q + \|p\|r = \max_{z\in\mathbb{R}^d} p^\top z \quad\text{s.t.}\quad \|z - q\|^2 \le r^2. \tag{13}$$


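Proof of Proposition 7. If $p = 0$, both sides of (12) and (13) are zero, so assume $p\neq 0$. Using a Lagrange multiplier $\lambda\ge 0$, the problem (12) is rewritten as

$$\begin{aligned}\min_{z\in\mathbb{R}^d} p^\top z \ \ \text{s.t.}\ \|z - q\|^2\le r^2 &= \min_{z}\max_{\lambda\ge 0}\big(p^\top z + \lambda(\|z - q\|^2 - r^2)\big)\\ &= \max_{\lambda\ge 0}\Big(-\lambda r^2 + \min_{z}\big(\lambda\|z - q\|^2 + p^\top z\big)\Big)\\ &= \max_{\lambda > 0} H(\lambda) := \Big(-\lambda r^2 - \frac{\|p\|^2}{4\lambda} + p^\top q\Big),\end{aligned}$$

where exchanging min and max is justified by strong duality (Slater's condition holds since $r > 0$), the inner minimum is attained at $z = q - p/(2\lambda)$, and $\lambda$ is strictly positive because the constraint $\|z - q\|^2\le r^2$ is active at the optimal solution. By letting $dH(\lambda)/d\lambda = 0$, the optimal $\lambda$ is written as

$$\lambda^* := \frac{\|p\|}{2r} = \arg\max_{\lambda > 0} H(\lambda).$$

Substituting $\lambda^*$ into $H(\lambda)$,

$$p^\top q - \|p\|r = \max_{\lambda > 0} H(\lambda).$$

The upper bound of $p^\top z$ in (13) can be shown similarly.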
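A quick numerical sanity check of Proposition 7 with arbitrary random data (not part of the original proof). The minimizer $z = q - rp/\|p\|$ follows from $z = q - p/(2\lambda^*)$ and $\lambda^* = \|p\|/(2r)$ in the proof above.

```python
import numpy as np

def min_over_ball(p, q, r):
    """Closed form (12): min p^T z subject to ||z - q||^2 <= r^2."""
    return p @ q - np.linalg.norm(p) * r

def max_over_ball(p, q, r):
    """Closed form (13): max p^T z subject to ||z - q||^2 <= r^2."""
    return p @ q + np.linalg.norm(p) * r

rng = np.random.default_rng(0)
p, q, r = rng.normal(size=4), rng.normal(size=4), 0.7
z_min = q - r * p / np.linalg.norm(p)  # q - p / (2 * lambda*), lambda* = ||p|| / (2r)
u = rng.normal(size=(1000, 4))
# Random feasible points: q plus a vector of length at most r.
z = q + r * rng.uniform(size=(1000, 1)) * u / np.linalg.norm(u, axis=1, keepdims=True)
print(np.isclose(p @ z_min, min_over_ball(p, q, r)),
      bool((z @ p >= min_over_ball(p, q, r)).all()))
```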
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Proof of Lemma 1. From Proposition 6, the optimal solution Wq satisfies 


wc + C^i^{'Wc) {wc-wc)<0, 


(14) 


where ^i{wQ) is a subgradient of £i at w = wf, for any i G [n] . 

Since from £i is convex for any i G [n] and the definition of a subgradient, we have the following two 
inequalities: 

£i{wc) > £^{wc) + U{wcV{wc - wc)- 

£i{wc) > £z{wc) + ^z{wc)^{Wc - Wc). 

Combining these two inequalities, we have 

^i{wh)^{wh - Wc) > ^i{wc)'^{wc - Wc)- 

Substituting (15) into (14), 


w*(7(wc - Wc) + C* X! - i^c) ^ 0. 

ie[n] 


From (4), 


Substituting (17) into (16), 


^*(^6) = ^ - ^c) ■ 


ie[n] 




wh^ {w*c -Wc) + -^ [giwc) - Wc) {Wc -Wc) <0 
w*c-\{w- ^{g{w) -w)) < w + ^ig{w) -w) ) 


The lower bound LB{wq x'^\wq) is given by solving the following optimization problem: 


min w’^ x) 

toj 


s.t. 


w*c - - ^{g{w) - w)) <( 


Using Proposition 7, the solution of (18) is given as 


< 


LBiw*Jx)\wc) = ^x'i^ (w - ^ig{w) - w)) - ||a;'|| ^(w + ^{giw) - w)) 


C l\2 

w + -^ig{w) - w) 


C, 


c 

^x'i^ {w-^{g{w)-w)) - ^||a;'||( 1 - ^ ||w|| + '^\\g{w 


0 !{wc,x'i) - ^{£3{wc,x)) +j{g{wc),x)))C, iiC >C, 
-P{wc,x)) + i(a(W( 5 , x') + 5{g(wc), x'f))C, if C < C. 

Similarly, the upper bound UB{w’^x)\wc) is given by solving the following optimization problem 

wh - - ^(.giw) - w)) < w + ^{giw)-w) ) , 


max x'i 


s.t. 


(15) 


(16) 


(17) 


(18) 


(19) 
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Figure 4: An illustrative example of Algo¬ 
rithm 2 behavior. The blue real lines repre¬ 
sent the validation error lower bound. The 
red chained lines and green dashed lines 
indicate the current best validation error 
upper bound and — e, respec¬ 

tively. If the blue validation error lower 
bound falls below the green ones, the val¬ 
idation error can be smaller by e than the 
current best. In such a case, the algo¬ 
rithm computes the next approximate solu¬ 
tion, and update the validation error lower 
bound based on the new approximate so¬ 
lution. The plot is an enlarged view of the 
region from C 13 to C 17 in Figure 3 (a) in 

§5. 


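and the solution of (19) is given as

$$\begin{aligned}
UB(w^{*\top}_C x_i' \mid \hat w_{\tilde C}) &= \frac12 x_i'^\top\Big(\hat w_{\tilde C} - \frac{C}{\tilde C}\big(g(\hat w_{\tilde C}) - \hat w_{\tilde C}\big)\Big) + \frac12\|x_i'\|\,\Big\|\hat w_{\tilde C} + \frac{C}{\tilde C}\big(g(\hat w_{\tilde C}) - \hat w_{\tilde C}\big)\Big\|\\
&\le \frac12 x_i'^\top\Big(\hat w_{\tilde C} - \frac{C}{\tilde C}\big(g(\hat w_{\tilde C}) - \hat w_{\tilde C}\big)\Big) + \frac12\|x_i'\|\Big(\Big|1 - \frac{C}{\tilde C}\Big|\,\|\hat w_{\tilde C}\| + \frac{C}{\tilde C}\|g(\hat w_{\tilde C})\|\Big)\\
&= \begin{cases} -\beta(\hat w_{\tilde C}, x_i') + \frac{1}{\tilde C}\big(\alpha(\hat w_{\tilde C}, x_i') + \delta(g(\hat w_{\tilde C}), x_i')\big)C, & \text{if } C > \tilde C,\\ \alpha(\hat w_{\tilde C}, x_i') - \frac{1}{\tilde C}\big(\beta(\hat w_{\tilde C}, x_i') + \gamma(g(\hat w_{\tilde C}), x_i')\big)C, & \text{if } C < \tilde C. \end{cases}
\end{aligned}$$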
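As a numerical sanity check on (18) and (19), the sketch below evaluates the exact minimum and maximum of $w^\top x_i'$ over the ball (the standard closed form: center times $x_i'$ minus or plus radius times $\|x_i'\|$), together with the looser bound obtained from the triangle inequality in the second line of each derivation. It is a minimal plain-Python illustration under our own assumptions: vectors are lists of floats, `g_hat` stands for $g(\hat w_{\tilde C})$, and the names `ball_score_bounds` and `relaxed_lower_bound` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def ball_score_bounds(w_hat, g_hat, x, C, C_tilde):
    """Exact solutions of (18) and (19): min / max of w^T x over ||w - m|| <= r, where
    m = 0.5 * (w_hat - (C / C_tilde) * (g_hat - w_hat)) and
    r = 0.5 * ||w_hat + (C / C_tilde) * (g_hat - w_hat)||."""
    s = C / C_tilde
    center = [0.5 * (w - s * (g - w)) for w, g in zip(w_hat, g_hat)]
    radius = 0.5 * norm([w + s * (g - w) for w, g in zip(w_hat, g_hat)])
    mid, spread = dot(center, x), radius * norm(x)
    return mid - spread, mid + spread


def relaxed_lower_bound(w_hat, g_hat, x, C, C_tilde):
    """Looser lower bound after ||w_hat + s (g - w_hat)|| <= |1 - s| ||w_hat|| + s ||g||."""
    s = C / C_tilde
    center = [0.5 * (w - s * (g - w)) for w, g in zip(w_hat, g_hat)]
    radius_ub = 0.5 * (abs(1.0 - s) * norm(w_hat) + s * norm(g_hat))
    return dot(center, x) - radius_ub * norm(x)


# With an exact solution (g_hat = 0) and C = C_tilde, the ball shrinks to the point w_hat.
print(ball_score_bounds([1.0, 2.0], [0.0, 0.0], [3.0, -1.0], 1.0, 1.0))  # (1.0, 1.0)
```

The relaxed bound is never larger than the exact one, which is why the piecewise-linear expressions in $C$ remain valid lower (and, symmetrically, upper) bounds.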


Remark 8. We note that the idea of using Propositions 6 and 7 for proving Lemma 1 is inspired by recent studies on safe screening [9, 28, 21, 19, 27]. Safe screening was introduced in the context of sparse modeling. It allows us to identify sparse features or instances before actually solving the optimization problem. A key technique used in those studies is to bound the Lagrange multipliers at the optimal solution (the Lagrange multiplier values at the optimal solution tell us which features or instances are active or non-active) in a way somewhat similar to what we did in §5. Our main contribution is to borrow this idea for representing a validation error lower bound as a function of the regularization parameter, and to show that it can be used for finding an approximately optimal regularization parameter with a theoretical guarantee.
B Details of the speed-up tricks for finding an ε-approximate regularization parameter

In this appendix, we describe two modifications of the basic algorithm for finding an ε-approximate regularization parameter presented in §4.2, for further speed-up.

Trick 1 The efficiency of the algorithm depends on how far one can step forward in each iteration. We see in (10) that the step size $C_{t+1} - C_t$ is large if the current minimum validation error upper bound $E_v^{\rm best}$ is small. In other words, the step size stays small until we have a sufficiently small $E_v^{\rm best}$. This suggests that, if we can find a small enough $E_v^{\rm best}$ at an earlier stage of the algorithm, we can reduce its total computational cost. In order to find a sufficiently small $E_v^{\rm best}$ as early as possible, we propose a simple heuristic approach in which we first search over the entire range with a coarse grid search.
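The coarse initial search of trick 1 (lines 1-9 of Algorithm 3) can be sketched as follows. This is an illustration only: `upper_bound` is a hypothetical callable standing in for "solve (1) approximately at $C$ and return $UB(E_v(w^*_C) \mid \hat w_C)$", and the function names are ours.

```python
import math


def coarse_log_grid(C_l, C_u, m):
    """m values evenly spaced on a log10 scale, starting at C_l (lines 2-4 of Algorithm 3)."""
    s = (math.log10(C_u) - math.log10(C_l)) / m
    return [10 ** (math.log10(C_l) + s * h) for h in range(m)]


def best_on_grid(grid, upper_bound):
    """Smallest validation error upper bound found on the grid (lines 1 and 5-8).

    upper_bound(C) is a hypothetical callable that solves (1) approximately at C
    and returns UB(E_v(w*_C) | w_hat_C).
    """
    e_best, c_best = 1.0, grid[0]  # line 1: E_best = 1, C_best = C_l
    for C in grid:
        ub = upper_bound(C)
        if ub < e_best:
            e_best, c_best = ub, C
    return e_best, c_best


grid = coarse_log_grid(1e-3, 1e3, 6)  # approximately 1e-3, 1e-2, ..., 1e2
```

The resulting $E_v^{\rm best}$ then seeds the finer search, so that (10) can take larger steps from the start.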

Trick 2 Our next modification for speed-up is to use the validation error lower bound $LB(E_v(w^*_C) \mid \hat w_{C_t}, \hat w_{C_{t+1}})$, which combines the bounds derived from the two approximate solutions $\hat w_{C_t}$ and $\hat w_{C_{t+1}}$, for computing the validation error lower bound in $C \in [C_t, C_{t+1}]$. It provides a tighter validation error lower bound than using $LB(E_v(w^*_C) \mid \hat w_{C_t})$ alone, which means that a larger step might be allowed in each iteration. However, we cannot actually compute $LB(E_v(w^*_C) \mid \hat w_{C_{t+1}})$ before we fix $C_{t+1}$. We thus propose a simple trial-and-error approach. Specifically, we step forward a little further than (10) when we select the next $C_{t+1}$. After we fix $C_{t+1}$, we compute an approximate solution and then check, using the now-available $LB(E_v(w^*_C) \mid \hat w_{C_t}, \hat w_{C_{t+1}})$, that the validation error $E_v(w^*_C)$ cannot be smaller than the current minimum by more than ε for any $C \in [C_t, C_{t+1}]$.
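This section does not spell out how the two bounds are combined. The sketch below shows two valid options under our own assumption that each bound counts validation instances certified to be misclassified at a given $C$: the pointwise maximum of the two counts, and the per-instance union, which can only be tighter. Which of these (if either) is used in the paper is not stated here, and the function names are hypothetical.

```python
def max_combined_lb(certified_a, certified_b):
    """Pointwise maximum of two lower bounds on the validation error at the same C.

    certified_a[i] (certified_b[i]) is True if validation instance i is guaranteed
    to be misclassified by the bound built from w_hat_{C_t} (w_hat_{C_{t+1}}).
    """
    n_val = len(certified_a)
    return max(sum(certified_a), sum(certified_b)) / n_val


def union_combined_lb(certified_a, certified_b):
    """Per-instance union: an instance counts if either bound certifies it.
    Still a valid lower bound, and never smaller than max_combined_lb."""
    n_val = len(certified_a)
    return sum(1 for a, b in zip(certified_a, certified_b) if a or b) / n_val


a = [True, False, False, True]
b = [False, False, True, True]
print(max_combined_lb(a, b), union_combined_lb(a, b))  # 0.5 0.75
```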

Algorithm 3 is the pseudo-code of the proposed algorithm with tricks 1 and 2.

There are two additional input parameters, $m \in \mathbb N$ and $\rho > 1$. The former is used for trick 1, where we initially compute $m$ approximate solutions for regularization parameter values evenly spaced in the interval $[C_l, C_u]$ on a logarithmic scale. Trick 1 is described in lines 2-9 of Algorithm 3.

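The latter, $\rho > 1$, is used for trick 2, where the next regularization parameter value is determined in a trial-and-error manner. To formally describe trick 2, let us define a set $\mathcal T$ as a function of $\hat w_{\tilde C}$ in the following way:

$$\mathcal T(\hat w_{\tilde C}) := \left\{\frac{\beta(\hat w_{\tilde C}, x_i')}{\alpha(\hat w_{\tilde C}, x_i') + \delta(g(\hat w_{\tilde C}), x_i')}\tilde C\right\}_{i\in\mathcal P} \cup \left\{\frac{\alpha(\hat w_{\tilde C}, x_i')}{\beta(\hat w_{\tilde C}, x_i') + \gamma(g(\hat w_{\tilde C}), x_i')}\tilde C\right\}_{i\in\mathcal N}.$$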

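Then, our initial trial step is written as

$$C^{\rm tmp}_{t+1} := \Big(\big\lfloor n'\big(LB(E_v(w^*_{C_t}) \mid \hat w_{C_t}) - E_v^{\rm best} + \rho\varepsilon\big)\big\rfloor + 1\Big)^{\rm th}\big(\mathcal T(\hat w_{C_t})\big), \qquad (20)$$

where $\rho > 1$ represents how far we step forward. We then compute an approximate solution $\hat w_{C^{\rm tmp}_{t+1}}$, and obtain a validation error lower bound $LB(E_v(w^*_C) \mid \hat w_{C_t}, \hat w_{C^{\rm tmp}_{t+1}})$ by combining $LB(E_v(w^*_C) \mid \hat w_{C_t})$ and $LB(E_v(w^*_C) \mid \hat w_{C^{\rm tmp}_{t+1}})$. To accept this trial step, we need to make sure that the lower bound is not smaller than the current best by more than ε for any $C \in [C_t, C^{\rm tmp}_{t+1}]$. To this end, we investigate where the two lower bounds $LB(E_v(w^*_C) \mid \hat w_{C_t})$ and $LB(E_v(w^*_C) \mid \hat w_{C^{\rm tmp}_{t+1}})$ go below $E_v^{\rm best} - \varepsilon$. To formulate this, let us define the following two functions:

$$C^L(\hat w_{C(L)}) := \Big(\big\lfloor n'\big(LB(E_v(w^*_{C(L)}) \mid \hat w_{C(L)}) - E_v^{\rm best} + \varepsilon\big)\big\rfloor + 1\Big)^{\rm th}\big(\mathcal T(\hat w_{C(L)})\big), \qquad (21)$$

$$C^R(\hat w_{C(R)}) := \Big(\big\lfloor n'\big(LB(E_v(w^*_{C(R)}) \mid \hat w_{C(R)}) - E_v^{\rm best} + \varepsilon\big)\big\rfloor + 1\Big)^{\rm th}\big(\mathcal A(\hat w_{C(R)})\big), \qquad (22)$$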

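where, for the latter, we define

$$\mathcal A(\hat w_{\tilde C}) := \left\{\frac{\alpha(\hat w_{\tilde C}, x_i')}{\beta(\hat w_{\tilde C}, x_i') + \gamma(g(\hat w_{\tilde C}), x_i')}\tilde C\right\}_{i\in\mathcal P} \cup \left\{\frac{\beta(\hat w_{\tilde C}, x_i')}{\alpha(\hat w_{\tilde C}, x_i') + \delta(g(\hat w_{\tilde C}), x_i')}\tilde C\right\}_{i\in\mathcal N},$$

and denote the $k$-th largest element of $\mathcal A$ as $k^{\rm th}(\mathcal A)$ for any natural number $k$. The trial step to $C^{\rm tmp}_{t+1}$ is accepted if

$$C^R(\hat w_{C^{\rm tmp}_{t+1}}) \le C^L(\hat w_{C_t}).$$

If not, we need to shrink the trial step by using the procedure described in Algorithm 4. Briefly, Algorithm 4 conducts a bisection search until we find two approximate solutions $\hat w_{C(L)}$ and $\hat w_{C(R)}$ that satisfy $C^R(\hat w_{C(R)}) \le C^L(\hat w_{C(L)})$. Note that, with the use of trick 2, the sequence of regularization parameter values $C_1, \ldots, C_t$ is not necessarily in increasing order, because they are computed in a trial-and-error manner.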
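The sets $\mathcal T$ and $\mathcal A$, the steps (20)-(22), and the acceptance test all reduce to order statistics of threshold lists. The sketch below takes the per-instance values of $\alpha$, $\beta$, $\gamma$, $\delta$ as precomputed inputs, so it makes no assumption about how they are evaluated. It assumes, as our reading of the notation, that $k^{\rm th}(\mathcal T)$ in (20) and (21) is the $k$-th smallest element (the main-text convention is not restated in this appendix) and, as stated above, that $k^{\rm th}(\mathcal A)$ is the $k$-th largest. Returning $\pm\infty$ when fewer than $k$ thresholds exist is also our assumption. All names are hypothetical.

```python
import math


def forward_thresholds(P, N, C_tilde):
    """T(w_hat): P, N are lists of (alpha, beta, gamma, delta) tuples for i in P, N."""
    return ([b / (a + d) * C_tilde for (a, b, g, d) in P]
            + [a / (b + g) * C_tilde for (a, b, g, d) in N])


def backward_thresholds(P, N, C_tilde):
    """A(w_hat): the analogous set used when looking backward from C(R)."""
    return ([a / (b + g) * C_tilde for (a, b, g, d) in P]
            + [b / (a + d) * C_tilde for (a, b, g, d) in N])


def kth(values, k, largest=False):
    """1-indexed k-th smallest (or largest) value; +inf (or -inf) if fewer than k values."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ordered = sorted(values, reverse=largest)
    if k <= len(ordered):
        return ordered[k - 1]
    return -math.inf if largest else math.inf


def step_index(lb, e_best, eps, n_val, rho=1.0):
    """floor(n' (LB - E_best + rho * eps)) + 1: rho > 1 in (20), rho = 1 in (21) and (22)."""
    return math.floor(n_val * (lb - e_best + rho * eps)) + 1


def trial_step(T_t, lb_t, e_best, eps, rho, n_val):
    """C^tmp_{t+1} in (20), computed from T(w_hat_{C_t})."""
    return kth(T_t, step_index(lb_t, e_best, eps, n_val, rho))


def accept_trial(T_left, lb_left, A_right, lb_right, e_best, eps, n_val):
    """Acceptance test C^R(w_hat_{C(R)}) <= C^L(w_hat_{C(L)}) built from (21) and (22)."""
    c_left = kth(T_left, step_index(lb_left, e_best, eps, n_val))
    c_right = kth(A_right, step_index(lb_right, e_best, eps, n_val), largest=True)
    return c_right <= c_left


T = forward_thresholds([(1.0, 2.0, 0.5, 1.0)], [(3.0, 1.0, 0.5, 0.0)], C_tilde=0.5)
print(T)  # [0.5, 1.0]
```

Keeping the threshold lists sorted once per solution would avoid re-sorting in every call; the plain version above favors readability.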


C Approximate regularization path in terms of validation errors 

In this appendix, we describe the details of the approximate regularization path in terms of validation errors and present its experimental results.

By slightly modifying the algorithm, we can compute an ε-approximate regularization path whose approximation level is measured in terms of the validation errors. Such an ε-approximate regularization path is formulated as a function


$$W : [C_l, C_u] \to \mathbb R^d,\quad C \mapsto w,$$

such that

$$\big|E_v(W(C)) - E_v(w^*_C)\big| \le \varepsilon, \quad \forall C \in [C_l, C_u].$$

In order to compute $W$, we need an upper bound of the validation errors as well as a lower bound, both represented as functions of the regularization parameter. Given a solution $\hat w_{\tilde C}$ for a regularization parameter $\tilde C$, our basic idea is to move forward along the regularization path as long as the difference between the upper and lower bounds stays below ε. Note that the approximation quality of our approximate regularization path is measured in terms of the validation errors, which is more advantageous for hyper-parameter tuning tasks than existing approaches [13, 11, 12, 20], in which the approximation quality is evaluated in terms of the objective function values.

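We compute a validation error upper bound based on the following simple facts:

$$y_i' = +1 \ \text{and}\ LB(w^{*\top}_C x_i' \mid \hat w_{\tilde C}) > 0 \ \Rightarrow\ \text{correctly classified}, \qquad (23a)$$

$$y_i' = -1 \ \text{and}\ UB(w^{*\top}_C x_i' \mid \hat w_{\tilde C}) < 0 \ \Rightarrow\ \text{correctly classified}. \qquad (23b)$$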

Based on these facts, we have a lemma for validation error upper bounds similar to Lemma 3: 


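Lemma 9. For a validation instance with $y_i' = +1$, if

$$\tilde C < C < \frac{\alpha(\hat w_{\tilde C}, x_i')}{\beta(\hat w_{\tilde C}, x_i') + \gamma(g(\hat w_{\tilde C}), x_i')}\tilde C \quad\text{or}\quad \frac{\beta(\hat w_{\tilde C}, x_i')}{\alpha(\hat w_{\tilde C}, x_i') + \delta(g(\hat w_{\tilde C}), x_i')}\tilde C < C < \tilde C,$$

then the validation instance $(x_i', y_i')$ is correctly classified. Similarly, for a validation instance with $y_i' = -1$, if

$$\tilde C < C < \frac{\beta(\hat w_{\tilde C}, x_i')}{\alpha(\hat w_{\tilde C}, x_i') + \delta(g(\hat w_{\tilde C}), x_i')}\tilde C \quad\text{or}\quad \frac{\alpha(\hat w_{\tilde C}, x_i')}{\beta(\hat w_{\tilde C}, x_i') + \gamma(g(\hat w_{\tilde C}), x_i')}\tilde C < C < \tilde C,$$

then the validation instance $(x_i', y_i')$ is correctly classified.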


This lemma can be easily shown by applying (5) to (23). 

Using Lemma 9, an upper bound of the validation errors is represented as a function of the regularization 
parameter C in the following form. 

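Theorem 10. Using an approximate solution $\hat w_{\tilde C}$ for a regularization parameter $\tilde C$, the validation error $E_v(w^*_C)$ for any $C > 0$ other than $\tilde C$ satisfies

$$\begin{aligned}
E_v(w^*_C) \le UB(E_v(w^*_C) \mid \hat w_{\tilde C}) := 1 - \frac{1}{n'}\Bigg(&\sum_{i:\,y_i'=+1} I\Big(\tilde C < C < \frac{\alpha(\hat w_{\tilde C}, x_i')}{\beta(\hat w_{\tilde C}, x_i') + \gamma(g(\hat w_{\tilde C}), x_i')}\tilde C \ \vee\ \frac{\beta(\hat w_{\tilde C}, x_i')}{\alpha(\hat w_{\tilde C}, x_i') + \delta(g(\hat w_{\tilde C}), x_i')}\tilde C < C < \tilde C\Big)\\
+ &\sum_{i:\,y_i'=-1} I\Big(\tilde C < C < \frac{\beta(\hat w_{\tilde C}, x_i')}{\alpha(\hat w_{\tilde C}, x_i') + \delta(g(\hat w_{\tilde C}), x_i')}\tilde C \ \vee\ \frac{\alpha(\hat w_{\tilde C}, x_i')}{\beta(\hat w_{\tilde C}, x_i') + \gamma(g(\hat w_{\tilde C}), x_i')}\tilde C < C < \tilde C\Big)\Bigg). \qquad (24)
\end{aligned}$$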

Theorem 10 is a direct consequence of Lemma 9. 
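Lemma 9 and Theorem 10 translate directly into a few comparisons per validation instance. The sketch below takes the per-instance values of $\alpha$, $\beta$, $\gamma$, $\delta$ (evaluated at $\hat w_{\tilde C}$) as given inputs; the helper names are hypothetical, and it assumes the denominators $\beta + \gamma$ and $\alpha + \delta$ are positive.

```python
def certainly_correct(y, C, C_tilde, alpha, beta, gamma, delta):
    """Sufficient condition of Lemma 9 for (x'_i, y'_i) to be correctly classified."""
    if C == C_tilde:
        return False  # Lemma 9 and Theorem 10 only cover C != C_tilde
    up = alpha / (beta + gamma) * C_tilde
    down = beta / (alpha + delta) * C_tilde
    if y == +1:
        return C_tilde < C < up or down < C < C_tilde
    return C_tilde < C < down or up < C < C_tilde


def validation_error_ub(instances, C, C_tilde):
    """UB(E_v(w*_C) | w_hat_{C_tilde}) in (24).

    instances: list of (y, alpha, beta, gamma, delta) tuples, one per validation instance.
    """
    n_val = len(instances)
    correct = sum(1 for (y, a, b, g, d) in instances
                  if certainly_correct(y, C, C_tilde, a, b, g, d))
    return 1.0 - correct / n_val


insts = [(+1, 2.0, 0.5, 0.5, 0.5), (-1, 2.0, 0.5, 0.5, 0.5)]
print(validation_error_ub(insts, 1.5, 1.0))  # 0.5
```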


C.1 Algorithm

Algorithm 5 is the pseudo-code for computing our approximate regularization path.
Figure 5: An illustrative example of the behavior of the algorithm for tracking the approximate regularization path in terms of validation errors. The blue solid lines and red lines represent the validation error lower and upper bounds, respectively. The green dashed lines indicate the difference between the validation error upper and lower bounds. If the green dashed line is greater than or equal to ε, we fail to track the ε-approximate path. In such a case, the algorithm computes the next approximate solution and updates the validation error lower and upper bounds based on the new approximate solution.


The main difference between Algorithm 2 and Algorithm 5 is in how the next regularization parameter value $C_{t+1}$ is determined. For tracking an approximate solution path, we need to find the smallest $C_{t+1} > C_t$ such that the difference between the upper and lower bounds, $UB(E_v(w^*_C) \mid \hat w_{C_t}) - LB(E_v(w^*_C) \mid \hat w_{C_t})$, is greater than or equal to ε. To formulate this, let us define


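$$\mathcal P' := \{i \in [n'] \mid y_i' = +1,\ LB(w^{*\top}_{C_t} x_i' \mid \hat w_{C_t}) > 0\}, \quad \mathcal N' := \{i \in [n'] \mid y_i' = -1,\ UB(w^{*\top}_{C_t} x_i' \mid \hat w_{C_t}) < 0\},$$

and

$$\mathcal M(\hat w_{C_t}) := \left\{\frac{\alpha(\hat w_{C_t}, x_i')}{\beta(\hat w_{C_t}, x_i') + \gamma(g(\hat w_{C_t}), x_i')}C_t\right\}_{i\in\mathcal N\cup\mathcal P'} \cup \left\{\frac{\beta(\hat w_{C_t}, x_i')}{\alpha(\hat w_{C_t}, x_i') + \delta(g(\hat w_{C_t}), x_i')}C_t\right\}_{i\in\mathcal P\cup\mathcal N'}.$$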

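Then, the $C_{t+1}$ that meets the above requirement is given by

$$C_{t+1} := \Big(\big\lfloor n'\big(LB(E_v(w^*_{C_t}) \mid \hat w_{C_t}) - UB(E_v(w^*_{C_t}) \mid \hat w_{C_t}) + \varepsilon\big)\big\rfloor + 1\Big)^{\rm th}\big(\mathcal M(\hat w_{C_t})\big). \qquad (25)$$

Figure 5 depicts how $C_{t+1}$ is determined.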
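Step (25) is again an order statistic, now over $\mathcal M(\hat w_{C_t})$. A minimal sketch, assuming (as in our reading of (20)) that $k^{\rm th}(\cdot)$ picks the $k$-th smallest element and that the index sets $\mathcal N \cup \mathcal P'$ and $\mathcal P \cup \mathcal N'$ have already been formed; the function names and the $(\alpha, \beta, \gamma, \delta)$ tuple layout are hypothetical.

```python
import math


def build_M(lb_certified, ub_certified, C_t):
    """M(w_hat_{C_t}): lb_certified holds (alpha, beta, gamma, delta) for i in N u P',
    ub_certified holds the same tuples for i in P u N'."""
    M = [a / (b + g) * C_t for (a, b, g, d) in lb_certified]
    M += [b / (a + d) * C_t for (a, b, g, d) in ub_certified]
    return M


def next_C_path(M, lb_t, ub_t, eps, n_val):
    """C_{t+1} in (25); +inf if fewer than k thresholds are available."""
    k = math.floor(n_val * (lb_t - ub_t + eps)) + 1
    if k < 1:
        raise ValueError("UB - LB already exceeds eps at C_t")
    ordered = sorted(M)
    return ordered[k - 1] if k <= len(ordered) else math.inf


print(next_C_path([1.5, 3.0, 2.0], lb_t=0.1, ub_t=0.2, eps=0.25, n_val=10))  # 2.0
```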

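Using the output of Algorithm 5, our approximate regularization path is written as

$$W : [C_l, C_u] \to \mathbb R^d,\quad C \mapsto \sum_{t\in[T]} \hat w_{C_t}\, I_{[C_t, C_{t+1})}(C),$$

where

$$I_{[C_t, C_{t+1})}(C) = \begin{cases} 1 & \text{if } C \in [C_t, C_{t+1}),\\ 0 & \text{if } C \notin [C_t, C_{t+1}). \end{cases}$$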
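Because $W$ is piecewise constant, evaluating it at a given $C$ is a binary search over the breakpoints $C_1 < \cdots < C_{T+1}$ returned by Algorithm 5. A small sketch using Python's standard `bisect` module; the function name is hypothetical, and it assumes the breakpoints are in increasing order.

```python
import bisect


def path_lookup(C_grid, solutions, C):
    """W(C) = w_hat_{C_t} for C in [C_t, C_{t+1}).

    C_grid: increasing breakpoints [C_1, ..., C_{T+1}]; solutions: [w_1, ..., w_T].
    """
    if len(C_grid) != len(solutions) + 1:
        raise ValueError("expected T + 1 breakpoints for T solutions")
    if not (C_grid[0] <= C < C_grid[-1]):
        raise ValueError("C lies outside [C_1, C_{T+1})")
    t = bisect.bisect_right(C_grid, C) - 1  # index of the interval containing C
    return solutions[t]


print(path_lookup([1.0, 2.0, 5.0, 10.0], ["w1", "w2", "w3"], 4.9))  # w2
```

Using `bisect_right` makes each interval closed on the left and open on the right, matching the indicator $I_{[C_t, C_{t+1})}$.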
In approximate regularization path computation, we need special treatment for a pathological situation in which the signs of the scores of multiple validation instances change at the same regularization parameter value $C$. Such a pathological situation is formally stated as follows. Let

$$\Omega := \{i \in [n'] \mid y_i' = +1,\ LB(w^{*\top}_C x_i' \mid \hat w_{C_t}) = 0\} \cup \{i \in [n'] \mid y_i' = -1,\ UB(w^{*\top}_C x_i' \mid \hat w_{C_t}) = 0\}.$$

Then, if the size of $\Omega$ is greater than

$$\big\lfloor n'\big(LB(E_v(w^*_{C_t}) \mid \hat w_{C_t}) - UB(E_v(w^*_{C_t}) \mid \hat w_{C_t}) + \varepsilon\big)\big\rfloor + 1,$$

Algorithm 5 does not work properly. Although such a pathological situation can be considered an exceptional case and treated by tedious book-keeping operations, in the following experiments we simply add the constraint $C_{t+1} - C_t > 10^{-6}$.

C.2 Experiments

Here, we describe the experimental results on approximate regularization path computation. The experimental setup is the same as that in §5. Since we cannot use the speed-up tricks here, we have two algorithm options. In the first option (op4), we used the optimal solutions $\{w^*_{C_t}\}_{t\in[T]}$ for computing CV error lower bounds. In the second option (op5), we instead used the approximate solutions $\{\hat w_{C_t}\}_{t\in[T]}$. Table 3 shows the experimental results. Compared with the results in Table 1, we needed to solve more optimization problems (denoted as $T$), and hence the total computational cost is larger than that of simply finding an ε-approximate regularization parameter. For the large datasets D9 and D10 with ε = 0, we could not finish the computations within 100 hours.
Table 3: Complexities and computational costs of the approximate regularization path computation experiments. For each of the two options and ε ∈ {0.10, 0.05, 0.01, 0}, the number of optimization problems solved (denoted as T) and the total computational cost (denoted as time) are listed. Note that, for op5, there are no results for ε = 0. For D9 and D10 with ε = 0, we could not finish the computations within 100 hours.


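| Data | ε | op4 (using $w^*_C$): T | op4: time (sec) | op5 (using $\hat w_C$): T | op5: time (sec) |
|---|---|---|---|---|---|
| D1 | 0.10 | 91 | 0.208 | 96 | 0.073 |
| D1 | 0.05 | 150 | 0.284 | 180 | 0.118 |
| D1 | 0.01 | 698 | 1.063 | 2095 | 0.597 |
| D1 | 0 | 6960 | 7.983 | N.A. | N.A. |
| D2 | 0.10 | 504 | 0.367 | 510 | 0.246 |
| D2 | 0.05 | 902 | 0.563 | 982 | 0.444 |
| D2 | 0.01 | 4549 | 2.711 | 9404 | 2.365 |
| D2 | 0 | 94612 | 68.31 | N.A. | N.A. |
| D3 | 0.10 | 175 | 1.739 | 186 | 0.592 |
| D3 | 0.05 | 314 | 2.615 | 374 | 1.005 |
| D3 | 0.01 | 1329 | 9.360 | 3248 | 3.409 |
| D3 | 0 | 56123 | 292.3 | N.A. | N.A. |
| D4 | 0.10 | 84 | 0.472 | 86 | 0.201 |
| D4 | 0.05 | 156 | 0.798 | 162 | 0.355 |
| D4 | 0.01 | 710 | 2.816 | 1218 | 1.497 |
| D4 | 0 | 14833 | 48.06 | N.A. | N.A. |
| D5 | 0.10 | 136 | 0.527 | 138 | 0.185 |
| D5 | 0.05 | 283 | 0.936 | 286 | 0.368 |
| D5 | 0.01 | 1561 | 3.840 | 2306 | 2.086 |
| D5 | 0 | 50101 | 104.9 | N.A. | N.A. |
| D6 | 0.10 | 238 | 4.828 | 240 | 1.691 |
| D6 | 0.05 | 503 | 9.185 | 507 | 3.518 |
| D6 | 0.01 | 2332 | 31.17 | 3300 | 17.16 |
| D6 | 0 | 74767 | 836.7 | N.A. | N.A. |
| D7 | 0.10 | 732 | 18.56 | 742 | 10.49 |
| D7 | 0.05 | 1316 | 31.77 | 1385 | 18.88 |
| D7 | 0.01 | 5820 | 118.4 | 7700 | 76.80 |
| D7 | 0 | 1583578 | 43212 | N.A. | N.A. |
| D8 | 0.10 | 227 | 1.991 | 229 | 1.410 |
| D8 | 0.05 | 469 | 3.987 | 475 | 2.872 |
| D8 | 0.01 | 2382 | 17.95 | 2385 | 14.75 |
| D8 | 0 | 397801 | 5481 | N.A. | N.A. |
| D9 | 0.10 | 352 | 844.0 | 357 | 302.6 |
| D9 | 0.05 | 717 | 1209 | 725 | 624.4 |
| D9 | 0.01 | 3741 | 4985 | 11631 | 11185 |
| D9 | 0 | – | > 100 h | N.A. | N.A. |
| D10 | 0.10 | 189 | 145.5 | 200 | 45.18 |
| D10 | 0.05 | 262 | 203.7 | 272 | 61.07 |
| D10 | 0.01 | 832 | 524.7 | 851 | 179.7 |
| D10 | 0 | – | > 100 h | N.A. | N.A. |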


D Adaptation to cross-validation setup 

All the methods presented above can be straightforwardly adapted to a cross-validation (CV) setup. Consider $k$-fold CV, where the $n$ instances are divided into $k$ disjoint subsets $\mathcal I_1, \ldots, \mathcal I_k$ of almost equal size. Let $w^{*(\kappa)}_C$ be the optimal solution trained without using the instances in $\mathcal I_\kappa$. Then, the $k$-fold CV error is defined as

$$E_{k\text{CV}}(C) := \frac{1}{n}\sum_{\kappa\in[k]}\sum_{i\in\mathcal I_\kappa} I\big(y_i\, w^{*(\kappa)\top}_C x_i < 0\big),$$

where, note that, the CV error is not a function of $w$ but a function of $C$. Our algorithm can find an ε-approximate regularization parameter at which the $k$-fold CV error is guaranteed to exceed the smallest possible $k$-fold CV error by no more than ε. For each of the $k$ folds, we can compute a validation error lower bound as described before. A lower bound of the entire $k$-fold CV error can then be obtained by simply summing them up.
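To make the aggregation concrete, the sketch below evaluates $E_{k\text{CV}}(C)$ from held-out scores and sums per-fold lower bounds. It assumes each fold's bound is expressed as a count of held-out instances guaranteed to be misclassified, so that the folds can be added and then normalized by $n$; per-fold bounds stored as error rates would instead need to be weighted by fold size. Function names are hypothetical.

```python
def kfold_cv_error(folds):
    """E_kCV(C). folds: list of (labels, scores) pairs, one per held-out subset,
    where scores[i] = x_i^T w^{*(kappa)}_C. A score of exactly 0 is not counted as an error."""
    n = sum(len(labels) for labels, _ in folds)
    errors = sum(1 for labels, scores in folds
                 for y, s in zip(labels, scores) if y * s < 0)
    return errors / n


def kfold_cv_lower_bound(fold_lb_counts, n):
    """Sum per-fold lower bounds on the NUMBER of misclassified held-out instances,
    then normalize by the total number of instances n."""
    return sum(fold_lb_counts) / n


folds = [([1, -1], [0.5, 0.3]), ([1, 1, -1], [-0.2, 0.4, -1.0])]
print(kfold_cv_error(folds), kfold_cv_lower_bound([1, 0], 5))  # 0.4 0.2
```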


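Algorithm 3: Finding an ε-approximate regularization parameter with approximate solutions using tricks 1 and 2

Input: $\{(x_i, y_i)\}_{i\in[n]}$, $\{(x_i', y_i')\}_{i\in[n']}$, $C_l$, $C_u$, ε, $m$, ρ

1: $C^{\rm best} \leftarrow C_l$, $E_v^{\rm best} \leftarrow 1$
2: $s \leftarrow (\log_{10} C_u - \log_{10} C_l)/m$
3: for $h = 0$ to $m - 1$ do
4:   $C_h \leftarrow 10^{\log_{10} C_l + sh}$
5:   solve (1) approximately for $C = C_h$
6:   if $UB(E_v(w^*_{C_h}) \mid \hat w_{C_h}) < E_v^{\rm best}$ then
7:     $E_v^{\rm best} \leftarrow UB(E_v(w^*_{C_h}) \mid \hat w_{C_h})$, $C^{\rm best} \leftarrow C_h$
8:   end if
9: end for
10: $C_m \leftarrow C_u$, $t \leftarrow 1$
11: for $h = 0$ to $m - 1$ do
12:   $C_t \leftarrow C_h$, $\hat w_{C_t} \leftarrow \hat w_{C_h}$
13:   while $C_t < C_{h+1}$ do
14:     Set $C^{\rm tmp}_{t+1}$ by (20) using $\hat w_{C_t}$
15:     if $C^{\rm tmp}_{t+1} > C_{h+1}$ then
16:       Set $C^{\rm tmp}_{t+1}$ by (22)
17:       if $C^{\rm tmp}_{t+1} > C_{h+1}$ then
18:         break while loop
19:       end if
20:     end if
21:     $\hat w_{C^{\rm tmp}_{t+1}} \leftarrow$ solve (1) approximately for $C = C^{\rm tmp}_{t+1}$
22:     Compute $UB(E_v(w^*_{C^{\rm tmp}_{t+1}}) \mid \hat w_{C^{\rm tmp}_{t+1}})$ by (8b)
23:     if $UB(E_v(w^*_{C^{\rm tmp}_{t+1}}) \mid \hat w_{C^{\rm tmp}_{t+1}}) < E_v^{\rm best}$ then
24:       $E_v^{\rm best} \leftarrow UB(E_v(w^*_{C^{\rm tmp}_{t+1}}) \mid \hat w_{C^{\rm tmp}_{t+1}})$
25:       $C^{\rm best} \leftarrow C^{\rm tmp}_{t+1}$
26:     end if
27:     $r \leftarrow 0$
28:     RecursiveCheck($C_t$, $C^{\rm tmp}_{t+1}$, $\hat w_{C_t}$, $\hat w_{C^{\rm tmp}_{t+1}}$, $r$)
29:     $C_{t+r+1} \leftarrow C^{\rm tmp}_{t+1}$, $\hat w_{C_{t+r+1}} \leftarrow \hat w_{C^{\rm tmp}_{t+1}}$
30:     $t \leftarrow t + r + 1$
31:   end while
32: end for

Output: $C^{\rm best} \in \mathcal C(\varepsilon)$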





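Algorithm 4: RecursiveCheck($C(L)$, $C(R)$, $\hat w_{C(L)}$, $\hat w_{C(R)}$, $r$)

Compute $C^L(\hat w_{C(L)})$ in (21).
Compute $C^R(\hat w_{C(R)})$ in (22).
if $C^R(\hat w_{C(R)}) \le C^L(\hat w_{C(L)})$ then
  return
else
  $r \leftarrow r + 1$
  $C_{t+r} \leftarrow \frac12\big(C^R(\hat w_{C(R)}) + C^L(\hat w_{C(L)})\big)$
  $\hat w_{C_{t+r}} \leftarrow$ solve (1) approximately for $C = C_{t+r}$
  if $UB(E_v(w^*_{C_{t+r}}) \mid \hat w_{C_{t+r}}) < E_v^{\rm best}$ then
    $E_v^{\rm best} \leftarrow UB(E_v(w^*_{C_{t+r}}) \mid \hat w_{C_{t+r}})$, $C^{\rm best} \leftarrow C_{t+r}$
  end if
  RecursiveCheck($C(L)$, $C_{t+r}$, $\hat w_{C(L)}$, $\hat w_{C_{t+r}}$, $r$)
  RecursiveCheck($C_{t+r}$, $C(R)$, $\hat w_{C_{t+r}}$, $\hat w_{C(R)}$, $r$)
end if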
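A Python rendering of Algorithm 4, offered as a sketch under stated assumptions: `solve`, `reach_left` and `reach_right` are hypothetical callables for "solve (1) approximately", $C^L(\cdot)$ in (21) and $C^R(\cdot)$ in (22). Instead of the shared counter $r$, newly solved points are appended to a list, and the $E_v^{\rm best}$ update is left to `solve`. The toy reach functions in the usage lines are made up purely to exercise the recursion.

```python
def recursive_check(C_left, C_right, w_left, w_right, solve, reach_left, reach_right,
                    visited, depth=0, max_depth=60):
    """Bisect until the safe regions from both ends overlap.

    C_left / C_right are kept only to mirror Algorithm 4's signature.
    reach_left(w):  C^L(w), how far to the right the bound from w stays safe.
    reach_right(w): C^R(w), how far to the left the bound from w stays safe.
    visited collects the new (C, w_hat) pairs, i.e. C_{t+1}, ..., C_{t+r}.
    """
    cl, cr = reach_left(w_left), reach_right(w_right)
    if cr <= cl:
        return  # the two safe regions overlap, so [C_left, C_right] is certified
    if depth >= max_depth:
        raise RuntimeError("bisection did not close the gap")
    C_mid = 0.5 * (cl + cr)  # midpoint of the uncovered gap
    w_mid = solve(C_mid)     # approximate solution of (1) at C_mid
    visited.append((C_mid, w_mid))
    recursive_check(C_left, C_mid, w_left, w_mid, solve, reach_left, reach_right,
                    visited, depth + 1, max_depth)
    recursive_check(C_mid, C_right, w_mid, w_right, solve, reach_left, reach_right,
                    visited, depth + 1, max_depth)


# Toy usage: a "solution" is its own C, and each bound stays safe for one unit of C.
visited = []
recursive_check(0.0, 5.0, 0.0, 5.0, solve=lambda C: C,
                reach_left=lambda w: w + 1.0, reach_right=lambda w: w - 1.0,
                visited=visited)
print([C for C, _ in visited])  # [2.5, 1.25, 3.75]
```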


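Algorithm 5: Tracking an ε-Approximate Regularization Path

Input: $\{(x_i, y_i)\}_{i\in[n]}$, $\{(x_i', y_i')\}_{i\in[n']}$, $C_l$, $C_u$, ε

1: $t \leftarrow 1$, $C_t \leftarrow C_l$
2: while $C_t < C_u$ do
3:   Solve (1) approximately at $C = C_t$ and obtain $\hat w_{C_t}$
4:   Set $C_{t+1}$ by (25)
5:   $t \leftarrow t + 1$
6: end while
7: $T \leftarrow t - 1$

Output: $C_1, \ldots, C_{T+1}$, $\hat w_{C_1}, \ldots, \hat w_{C_T}$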
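The outer loop of Algorithm 5 is short enough to sketch directly. Here `solve` and `next_C` are hypothetical callables for the approximate solver of (1) and for step (25). The `min_step` guard mirrors the minimum-step constraint used in the experiments; its default is an illustrative value only.

```python
def track_path(C_l, C_u, solve, next_C, min_step=1e-6, max_steps=100_000):
    """Sketch of Algorithm 5.

    solve(C) -> w_hat_C          (hypothetical approximate solver for problem (1))
    next_C(C_t, w_hat) -> C_{t+1} (hypothetical implementation of (25))
    Enforces C_{t+1} - C_t >= min_step so that the loop always advances.
    Returns ([C_1, ..., C_{T+1}], [w_hat_{C_1}, ..., w_hat_{C_T}]).
    """
    Cs, ws = [C_l], []
    while Cs[-1] < C_u:
        if len(ws) >= max_steps:
            raise RuntimeError("too many steps")
        w = solve(Cs[-1])
        ws.append(w)
        Cs.append(max(next_C(Cs[-1], w), Cs[-1] + min_step))
    return Cs, ws


Cs, ws = track_path(1.0, 10.0, solve=lambda C: C, next_C=lambda C, w: 2 * C)
print(Cs, ws)  # [1.0, 2.0, 4.0, 8.0, 16.0] [1.0, 2.0, 4.0, 8.0]
```

The returned lists have the $T+1$ and $T$ entries expected by the piecewise-constant path $W$.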